Common Software-Related Metrics

Learn about the categories of software development metrics, such as process, reliability, runtime performance, security, maintainability, and responsiveness.

The best metrics are those developed for your team’s and your company’s particular situation; no single metric will give you perfect information or insight into your team. That said, a number of metrics have emerged over time as good defaults, and can often serve as a “starting point” for further drill-down or investigation.

Metrics often fall into a variety of different categories:

  • Process. Lead time, cycle time, PR-to-review, and so on.
  • Reliability. Production outages/incidents, average failure rate, load testing, mean-time-to-recovery, time-to-discovery, and so on.
  • Runtime performance. User transactions per minute (or second), simultaneous user sessions, stress testing, soak testing, application performance monitoring, and so on.
  • Security. Number of vulnerabilities discovered, time to resolution, time to production deployment of updates, number and severity of security incidents.
  • Maintainability/Code quality. Static code analysis numbers, code complexity, lines of code, and so on.
  • Responsiveness. Response time to incident discovery, elapsed time to customer bug report.

One thing to keep in mind is that some metrics are measured against the team as a whole, others against individuals (and some can be applied to either or both). As we’ve discussed in the last section on relationships, psychological safety is paramount when creating a highly functional team, and being careless here can undo all that effort. When talking about metrics with the team, make clear which are aggregated at the team level and which are individual, so that everybody understands what you’re looking for and why. If you measure a given metric simultaneously at both the team level and the individual level, you run the risk of dividing your team: team members will be able to look at teammates’ numbers and identify “who’s holding us back,” which creates negative peer pressure. When examining metrics, keep clear in your head whether you are looking to measure the team as a whole or individual effort within the team. You need both.

Additionally, be wary of metrics that focus on “output” as opposed to “outcomes.” It can be easy to lose track of outcomes—the ultimate end result of the work being done—in the face of the easy metrics generated by your employees’ actions, such as the number of lines of code written. An old proverb holds that “one should never mistake motion for progress”—a bird can remain aloft for hours by riding thermal air columns without beating its wings once, while the same bird can flap its wings for all it’s worth and make no headway if it is flying headfirst into a hurricane. “Motion,” the number of wingbeats per minute, does not always translate into “progress,” the distance the bird travels. As you gain experience with your team, your company, and your industry, you will find it easier to spot which metrics are counting wingbeats and which are tracking distance covered. (Better yet, measure both, so that you can tell the bird to land during the hurricane!)


Let's go through some of the more common metrics.

Velocity#

One of the classic agile-inspired metrics is developer velocity: Assuming that a developer is asked to estimate how long a particular body of code will take to write, we then compare the actual amount of time it took and obtain a ratio—that developer's "velocity"—to use in future estimates. For example, if I think it'll take me 16 hours, and it ends up taking 32, then my velocity is 0.5 and we can safely double all of my future estimates to obtain a more "realistic" value.
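The arithmetic above can be sketched in a few lines; this is only an illustration of the ratio described in the text, with function names of my own choosing:

```python
def velocity(estimated_hours, actual_hours):
    """Ratio of estimated to actual time; below 1.0 means the work took longer than estimated."""
    return estimated_hours / actual_hours

def adjusted_estimate(raw_estimate_hours, velocity_ratio):
    """Scale a new raw estimate by a historical velocity ratio."""
    return raw_estimate_hours / velocity_ratio

# A task estimated at 16 hours that actually took 32 yields a velocity of 0.5,
# so a new 24-hour raw estimate becomes a 48-hour "realistic" estimate.
v = velocity(16, 32)
print(v, adjusted_estimate(24, v))
```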

Velocity is one of those metrics that tends to draw fire. Some managers and developers hate it, pointing out that it's difficult to measure the amount of time actually spent on a feature. (If I think about it in the shower, does that count as time spent working on the code? What about if I'm coding on it while listening to the company all-hands?) This leads to attempts to talk about "ideal hours" as opposed to "actual hours," with the suggestion that after meetings and other day-to-day activities, developers only get maybe four "ideal hours" of coding work done per day.

For the most part, velocity seems to have decreased in popularity as a performance metric, though it may still crop up from time to time during up-front estimation activities. I find it useful to have in mind during estimation, but I generally don't track it formally since it is too easy to abuse.

"Burn-down"#

This is another classic metric of the agile community, in which the team tracks the number of stories or tasks to accomplish before the release. In theory, when the burndown chart reaches 0, you ship the release. Related metrics track the number of stories, tasks, or "story points" the team has accomplished per sprint, as another measurement of velocity (above).
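As a minimal sketch of the related per-sprint measurement, using hypothetical end-of-sprint snapshots of remaining story points:

```python
# Remaining story points recorded at the end of each sprint (hypothetical data).
remaining = [40, 33, 27, 18, 9, 0]

def points_burned_per_sprint(remaining_points):
    """Points completed in each sprint: the drop between consecutive snapshots."""
    return [a - b for a, b in zip(remaining_points, remaining_points[1:])]

burned = points_burned_per_sprint(remaining)
print(burned)                     # points completed per sprint
print(sum(burned) / len(burned))  # average points per sprint, a velocity of sorts
```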

Schedule delivery#

One of the oldest metrics in the industry is also the most basic: "Did you hit the deadline?" Whether measured per developer per task or for the software application as a whole (across the entire department or team), it's an often-cited benchmark when discussing team performance. Inevitably, questions arise as to how the deadline came to be: did the developers estimate the task themselves, was the ship date set by the Marketing team, or one of the infinite variations in between? As a performance measurement, then, it is usually very coarse-grained and extremely "laggy" (meaning it shows its results very late). Being one of the most obvious, it can be a useful starting point, but it should be a team-measured metric, taken with a great deal of context and awareness of surrounding circumstances. (In many respects, this is just a variation of velocity, above, but writ at a larger scale.)

Lead time, cycle time, and other "X" time#

Lead time measures how much time passes between task creation and work completion. It can provide a larger perspective on how long it takes for a client’s requested feature to be completed.

Cycle time measures the time between starting and completing a task. It is often broken down into several Agile process metrics, each corresponding to a stage in the software development process, such as coding, pickup, review, and deploy:

  • First Commit to Open PR. Time elapsed from the first commit to the creation of a pull request.

  • Time to First Review. Time spent between an engineer opening a PR and the PR being reviewed. As a team metric, it measures how fast reviewers pick up their peers’ PRs for review.

  • Time to Merge from First Review. The time spent between a PR’s first review and that PR being merged, intending to measure how fast submitters implement their peers’ feedback.

  • Time to Deploy from Merge. The time elapsed from the moment a PR is merged to the time of its deployment into production.
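These stage measurements are just differences between timestamps. A minimal sketch, using hypothetical timestamps for one pull request’s life cycle:

```python
from datetime import datetime

# Hypothetical timestamps for one pull request's life cycle.
events = {
    "first_commit": datetime(2023, 5, 1, 9, 0),
    "pr_opened":    datetime(2023, 5, 2, 14, 0),
    "first_review": datetime(2023, 5, 3, 10, 0),
    "merged":       datetime(2023, 5, 3, 16, 0),
    "deployed":     datetime(2023, 5, 4, 11, 0),
}

def hours_between(start, end):
    """Elapsed hours between two recorded events."""
    return (events[end] - events[start]).total_seconds() / 3600

stages = {
    "coding": hours_between("first_commit", "pr_opened"),
    "pickup": hours_between("pr_opened", "first_review"),
    "review": hours_between("first_review", "merged"),
    "deploy": hours_between("merged", "deployed"),
}
cycle_time = hours_between("first_commit", "deployed")
print(stages, cycle_time)  # the stage durations sum to the overall cycle time
```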

It is common to measure the time required to go from "start to finish," where "finish" usually means "deployed to production." As can be seen above, however, the starting point varies: it might be when the story is put into the backlog, when the team accepts the story to work on in this sprint, or when the story is first conceived as part of a larger epic.

Often this metric is desired as a means by which to aid estimates for future work—if we know how long it took us to ship feature X, then we can better estimate how long it will take us to ship similarly-sized feature Y.

While intended as a measure of how quickly a developer or team can bring something from "ideation to execution," this metric is coarse-grained, and can often be distorted by a variety of non-developer factors: hotfixes required during the sprint, management re-prioritization of work, or stories/features/epics discovered to be incomplete or unactionable after being handed off to the development team. In other scenarios, simple human factors such as PTO can skew the metric.

Aggregate measurements#

The other way to measure the development of a project is to look at the project's metrics collectively: the number of commits per day, the relative size of each commit (the "volume" of all the commits), a percentage impact of commits against the total size of the codebase, and so on.
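A sketch of these aggregate numbers, using hypothetical per-day commit data (the tuple shape and function name are illustrative, not from any particular tool):

```python
# Hypothetical commit log for one day: (author, lines_added, lines_removed).
commits_today = [("alice", 120, 30), ("bob", 40, 10)]

def day_stats(commits, codebase_loc):
    """Commit count, total volume in changed LOC, and percentage impact on the codebase."""
    volume = sum(added + removed for _, added, removed in commits)
    return len(commits), volume, 100 * volume / codebase_loc

count, volume, impact = day_stats(commits_today, codebase_loc=2_000_000)
print(f"{count} commits, volume {volume} LOC, {impact:.3f}% of codebase")
```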

This can sometimes be useful to gain some context about the scope of an impact on an existing codebase—if, for example, the team is working to add a feature to a several-million-LOC codebase, and their total volume of commits is 50k LOC and clearly scoped to just a small percentage of the number of files in the repository, it's safe to conclude that the team's impact on that codebase should be pretty small. (However, never discount the possibility that the codebase is deeply entangled and small changes in any corner could have wide-ranging impacts.) This, then, can be a useful context to have in mind when defect rates spike, particularly if bugs emerge in parts of the codebase that the team didn't work on.

Aggregate measurements can be useful to set some team metrics, but keep in mind that measuring commits-per-day can lead developers to feel pressured to do smaller, more frequent commits just to inflate that number. If you use this as a metric, make sure to also keep track of how many team members were working that day—nothing sucks more as a member of the team than to watch three of your six peers go on summer vacation and knowing your metrics are going to tank because of it.

Code churn#

One of the favorites is "code churn": for a given body of code (often a given feature), how much code needs to be rewritten before that body achieves some level of stability or acceptance? This can be expressed in a variety of ways (code churn per developer or per team, per period of time, per feature or sub-feature or story). Keep in mind that although it may seem at first like an indictment of a particular developer ("Franz here rewrote the same line of code 47 times, he must be terrible! Jacques over there only had to write that other line of code once!"), code churn can come from a variety of sources beyond the individual developer: was the feature well understood, was the story well written, was the product owner available for questions that came up, was there new technology involved around this body of code, and so on.

All things being equal, code churn can help to identify which developers are stronger on the team than others, but things are only equal when the relative complexity of what each developer is working on is also equal. The opposite metric to code churn, by the way, is often called "productive throughput."
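One simple way to express churn is rewrites per surviving line; this is a sketch under that assumption, with hypothetical data:

```python
# Hypothetical per-line edit counts for a feature: how many times each line
# of the final code was written or rewritten before the feature stabilized.
edits_per_line = {"parser.py:12": 1, "parser.py:13": 47, "api.py:8": 2}

def churn_ratio(edits):
    """Total writes divided by final line count; 1.0 means no line was ever rewritten."""
    return sum(edits.values()) / len(edits)

# A high ratio flags unstable code, not necessarily a weak developer.
print(churn_ratio(edits_per_line))
```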

Bug delta#

Assuming there is a bug tracking system, and assuming there is a team dedicated to finding and identifying bugs, we can track the number of new incoming bugs against the number of bug fixes—if the difference between "new" and "fixed" is getting smaller, then the system is assumed to be getting more stable, and if the difference is getting larger, then assumptions are that things are growing less stable and/or more fragile.
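The delta itself is simple arithmetic over tracker counts; a sketch with hypothetical weekly numbers:

```python
# Hypothetical weekly counts from the bug tracker.
new_bugs   = [12, 15, 9, 7]
fixed_bugs = [8, 14, 11, 10]

def bug_delta(new, fixed):
    """Net change in open bugs per period; a shrinking (or negative) delta
    suggests the system is stabilizing, a growing one suggests fragility."""
    return [n - f for n, f in zip(new, fixed)]

print(bug_delta(new_bugs, fixed_bugs))  # trending downward: stabilizing
```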

Any metric around "bugs," however, raises the question, "What is a bug?" That question has started more fights between developers and QA teams than any other, as QA marks something as a "bug" and development closes it with "Works as intended" or the oft-quoted line "It's a feature." Caveat emptor: "Let the buyer beware."

Code coverage#

The comprehensiveness of unit tests can be measured by using instrumentation (code-coverage) tools to measure what percentage of the code-under-test is actually executed. If a unit test suite covers 100% of the code, every line in the code-under-test is executed at some point. This can be a useful metric, but it runs a few risks: first, the amount of test code required to reach that number can be extensive; second, merely executing a line doesn't always "test" it effectively; and third, code can still fail with 100% coverage.
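The third risk is worth a concrete (contrived) illustration: here, a single test executes every line of the function, so a coverage tool would report 100%, yet the function still fails on other inputs:

```python
def divide(a, b):
    return a / b

def test_divide():
    # This one assertion executes every line of divide(), giving 100% line
    # coverage -- but divide(1, 0) still raises ZeroDivisionError in production.
    assert divide(10, 2) == 5

test_divide()
```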

Response times#

With the rise of DevOps in the software development organization, some of the traditional "operations" metrics become feasible for the team. One common pair is "Mean Time Between Failures" (MTBF) and "Mean Time to Recovery" (MTTR), which measure the average (arithmetic mean, more precisely) time between tech incidents in production and the average time before a discovered incident is fixed/corrected (either by production configuration adjustment or a code fix, depending on the issue). Remember, if the team is engaged in the full DevOps experience, the team is now responsible for the runtime operation of the software they build, and it is reasonable to hold them accountable for their response time to "something happening."
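Both averages fall out of incident timestamps directly; a sketch with hypothetical detection/resolution times:

```python
from datetime import datetime

# Hypothetical production incidents: (detected, resolved) timestamps.
incidents = [
    (datetime(2023, 5, 1, 3, 0),  datetime(2023, 5, 1, 4, 30)),
    (datetime(2023, 5, 11, 3, 0), datetime(2023, 5, 11, 9, 0)),
    (datetime(2023, 5, 21, 3, 0), datetime(2023, 5, 21, 4, 30)),
]

def mttr_hours(incidents):
    """Mean time to recovery: average detection-to-resolution span."""
    spans = [(end - start).total_seconds() / 3600 for start, end in incidents]
    return sum(spans) / len(spans)

def mtbf_hours(incidents):
    """Mean time between failures: average gap between consecutive incident starts."""
    starts = sorted(start for start, _ in incidents)
    gaps = [(b - a).total_seconds() / 3600 for a, b in zip(starts, starts[1:])]
    return sum(gaps) / len(gaps)

print(mttr_hours(incidents), mtbf_hours(incidents))
```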
